Production of Phrase Tables in 11 European Languages using an Improved Sub-sentential Aligner
Abstract
This paper is a partial report on an ongoing Kakenhi project that aims to improve sub-sentential alignment and to release multilingual syntactic patterns for statistical and example-based machine translation. Here we focus on improving a sub-sentential aligner that is an instance of the association approach. A phrase table is not only an essential component of machine translation systems but also an important resource for research and use in other domains. As part of this project, all phrase tables produced in the experiments will be made freely available.
Similar resources
Fast BTG-Forest-Based Hierarchical Sub-sentential Alignment
In this paper, we propose a novel BTG-forest-based alignment method. Based on a fast unsupervised initialization of parameters using variational IBM models, we synchronously parse parallel sentences top-down and align hierarchically under the constraint of BTG. Our two-step method can achieve the same run-time and comparable translation performance as fast_align while it yields smaller phrase tab...
A Chunk-Driven Bootstrapping Approach to Extracting Translation Patterns
We present a linguistically motivated sub-sentential alignment system that extends the intersected IBM Model 4 word alignments. The alignment system is chunk-driven and requires only shallow linguistic processing tools for the source and the target languages, i.e. part-of-speech taggers and chunkers. We conceive the sub-sentential aligner as a cascaded model consisting of two phases. In the firs...
Robust Language Pair-Independent Sub-Tree Alignment
Data-driven approaches to machine translation (MT) achieve state-of-the-art results. Many syntax-aware approaches, such as Example-Based MT and Data-Oriented Translation, make use of tree pairs aligned at sub-sentential level. Obtaining sub-sentential alignments manually is time-consuming and error-prone, and requires expert knowledge of both source and target languages. We propose a novel, lang...
Capturing Translational Divergences with a Statistical Tree-to-Tree Aligner
Parallel treebanks, which comprise paired source-target parse trees aligned at sub-sentential level, could be useful for many applications, particularly data-driven machine translation. In this paper, we focus on how translational divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We observe that while the algorithm performs well at the...
Bilingual phrase-to-phrase alignment for arbitrarily-small datasets
This paper presents a novel system for sub-sentential alignment of bilingual sentence pairs, however few, using readily-available machine-readable bilingual dictionaries. Performance is evaluated against an existing gold-standard parallel corpus where word alignments are annotated, showing results that are a considerable improvement on a comparable system and on GIZA++ performance for the same ...
Publication date: 2014